ABSTRACT 

The purpose of this study was to compare five methods 
of computing school effectiveness indices (SEIs) from longitudinal 
data. The methods were within-school regression, within-school 
regression corrected for the unreliability of measurement, mean 
difference scores, average individual residual scores, and school 
residual scores. The sample consisted of 3,769 third-graders from 70 
elementary schools in the Midwest. The raw data consisted of Total 
Reading scores from the Metropolitan Primary II Achievement Test 
administered in fall 1970 and spring 1971. While the various school 
effectiveness indices differed from one another and in their 
correlations with other variables, little evidence could be found for 
the lack of validity of any school effectiveness index. Further, all 
of the school effectiveness indices were highly stable across 
samples, except for the indices for initially high-scoring students. 
Finally, predictions from nonlongitudinal data furnished reasonable 
estimates of school effectiveness as measured by one of the indices. 
(Author/CK) 

A COMPARISON OF SELECTED SCHOOL EFFECTIVENESS 
MEASURES BASED ON LONGITUDINAL DATA 
Gary L. Marco 
Educational Testing Service 

Abstract 

The purpose of this study was to compare five methods of computing 
school effectiveness indices (SEIs) from longitudinal data. The five 
methods were within-school regression, within-school regression corrected 
for the unreliability of measurement, mean difference scores, average 
individual residual scores (based on the regression of student output 
scores on student input scores), and school residual scores (based on the 
regression of school mean output scores on school mean input scores). The 
sample consisted of 3,769 third-graders from 70 elementary schools in the 
Midwest. The raw data consisted of Total Reading scores from the 
Metropolitan Primary II Achievement Test administered in Fall 1970 and 
Spring 1971. 

While the various school effectiveness indices differed from one another 
and in their correlations with other variables, little evidence could be 
found for the lack of validity of any school effectiveness index. Further, 
all of the school effectiveness indices were highly stable across samples, 
except for the school effectiveness indices for initially high-scoring 
students. Finally, predictions from nonlongitudinal data furnished 
reasonable estimates of school effectiveness as measured by one of the 
school effectiveness indices. 

The methods should be tried out at other grade levels. Further, the sta- 
bilities of the various school effectiveness indices across years should 
be studied. 




With the recent emphasis in education upon program budgeting and cost 
effectiveness has come a renewed interest in school system evaluation. 
However, how school effectiveness should be estimated is unclear. The 
purpose of this study is to compare selected methods of estimating school 
effectiveness from longitudinal data. 

Various techniques have been suggested to generate school effectiveness 
indices. Indices commonly used are the average performance of students in 
a particular grade in the school and the difference between the performance 
of students in the school and the performance of a national norm group. 
Although these two methods have been widely used, they have a fatal flaw: 
neither takes into account the differences in initial status. 

In some studies partial control over differing input levels has been 
achieved by holding socioeconomic status (SES) constant. Schools serving 
students from low SES families have been compared with one another, as have 
schools serving students from more advantaged families. The school effective- 
ness index in such a case is the deviation of performance from the average 
of the schools serving like children. This index is often employed with 
data collected at one point in time for a given grade level, such as state- 
wide testing program data. Ability scores have sometimes been partialed 
out of achievement scores in an attempt to control for initial differences. 
In this case, the difference between the actual performance and the predicted 
performance has been used as a measure of school effectiveness. Unfortunately, 
the distinction between ability and achievement is unclear operationally, 
so that partialing out ability also partials out some of the valid school 
variance. In other studies cross-sectional data have been used to estimate 
school effectiveness indices. These data are useful for estimating school 
effectiveness only if it is assumed that students in the lower grade are 
now performing on the average at the same level as students in the higher 
grade did at the lower grade level. The difference in the means of the 
two groups has been used as a measure of effectiveness. 

Longitudinal data have been recognized as the sine qua non of good 
evaluation in nonexperimental settings (see Dyer, Linn, & Patton, 1969; 
Hilton & Patrick, 1970). Longitudinal data may be available at the school 
level (for example, third grade arithmetic mean and sixth grade arithmetic 
mean three years later) or at the student level. Unless the student group 
enrolled in a lower grade has remained intact over the interim period, the 
school data will be based on a group that is somewhat different from the 
group of students that was present at both data collection points. To 
distinguish these "unmatched" groups from groups that are composed of the 
same students, the former has been called an "unmatched-longitudinal 
sample" and the latter a "matched-longitudinal sample" (Dyer et al., 1969). 

Dyer, Linn, and Patton compared school effectiveness indices based on 
a matched-longitudinal sample, an unmatched-longitudinal sample, and a 
cross-sectional sample, and concluded: 

    Although it seems apparent that the use of discrepancy 
    measures raises a great many problems needing further 
    research, it is also apparent, from the present study, 
    that such measures when based on matched-longitudinal 
    samples are the ones most likely to provide valid 
    measures of system effectiveness [1969, p. 605]. 

There are a number of methods of generating school effectiveness 
indices that could be used with longitudinal data that were not used in 
the Dyer study. The purposes of the present study were (a) to compare the 
school effectiveness indices generated by the selected methods, (b) to 
estimate the stability of the school effectiveness indices across samples, 
and (c) to assess the adequacy of using nonlongitudinal data for predicting 
school effectiveness indices obtained from longitudinal data. Three 
sub-studies were conducted to accomplish these purposes. 

General Procedures 

The general procedures for the study are outlined in this section. 
Procedures specific to the sub-studies are discussed in the three following 
sections. 

Sample. The schools in the sample consisted of 70 elementary schools 
that participated in a 1970-71 ESEA Title I statewide evaluation study 
conducted in the Midwest. The students in the sample were those 
third-graders who took a pretest in the Fall (November primarily) of 1970 
and a posttest in the Spring (May primarily) of 1971. Only those students 
tested in both the Fall and the Spring were included in the study sample. 
A total of 4,778 students were tested at least once; 3,769 students (79%) 
were tested both times. The sample sizes for the schools in the study 
ranged from 17 to 152. 

Instruments. Forms F and G of the Primary II Reading Test of the 
Metropolitan Achievement Tests were used for the study. The Primary II 
Reading Test is appropriate for second- and third-graders. Since the 
third-graders in the sample, being students in Title I and comparison 
schools, were assumed to be lower achievers than the average third-grader, 
the Primary II Reading Test was administered to ensure that the test 
material would not be too difficult for the students. Total Reading 
standard scores from the two administrations were used as the data base 
for the study. 

The appropriateness of the Primary II Reading Test for the sample is 
indicated by the below-average pretest means of the sample. The pretest 
mean reading score for the 3,769 students was 51.82, which corresponds to 
a grade equivalent score of 2.6. Thus, the study sample was on the average 
about six months behind the norm for students in the second month of their 
third year. 

Variables. Information on a number of variables was used in the study. 
These are listed in Table 1. Records of the State Department of Education 
furnished information on Variables 1-9. A questionnaire about Title I 
reading programs yielded data on Variables 10 and 11, while Variables 12-15 
were obtained directly from the Metropolitan Achievement Test. 

Insert Table 1 about here 

Since schools tested on different dates, the number of days between 
the pretest and the posttest was not the same from school to school. The 
number of weekdays between testings ranged from 119 days to 160 days. 
Variable 16, Weekdays between Pretest and Posttest, was derived from the 
testing dates. 

The staff of each school was asked to identify those students 
participating in an ESEA Title I reading program. This information, 
coupled with the number of students taking both tests, defined Variable 17, 
Percent of Students Participating in Title I Reading. 



Student stability also varies from school to school. To obtain a 
"stability" index for the school, the total number of students who took both 
tests was divided by the total number of students tested, the assumption 
being that those students not tested twice transferred either in or out 
during the year. This variable was Percent of Students Taking Both Tests 
(Variable 18). 

Variables 19-30 were those associated with the various school 
effectiveness indices that were derived. Variables 19-21, 23-26, 28, and 30 
were of primary interest in Sub-studies I and II. Variables 1-18, 22, 27, 
and 29 were useful in interpreting the school effectiveness indices in 
Sub-study I. Variables 1-9 and Variable 18 were used as predictors in 
Sub-study III. 

Methods of estimating school effectiveness. Five methods were 
used to compute school effectiveness indices. These were as follows: 

1. Within-school regression. For each school the regression line 
describing the relationship between individual student pretest 
and posttest scores was computed and posttest values were estimated 
at reference points for low-, middle-, and high-scoring students. 
That is, for a given reference point, $X_0$, 

$$\mathrm{SEI} = AX_0 + \bar{Y} - A\bar{X} = \bar{Y} - A(\bar{X} - X_0)$$

where $A$ is the least squares estimate of the within-school slope; and 
$\bar{Y}$ and $\bar{X}$ are the school means on the posttest and pretest, 
respectively. 

2. Corrected within-school regression. Same as 1, except the slope 
and intercept were corrected for the unreliability of the pretest 
measure on the basis of data reported by the test publisher. 
Symbolically, 

$$\mathrm{SEI} = \frac{A}{R_{xx}} X_0 + \bar{Y} - \frac{A}{R_{xx}} \bar{X}$$

where $A$, $X_0$, $\bar{Y}$, and $\bar{X}$ are defined as in Method 1; and 
$R_{xx}$ is the pretest reliability estimate for the school. 

3. Mean difference scores. For each school the mean difference 
score (posttest score minus pretest score) was computed; thus, 

$$\mathrm{SEI} = \bar{Y} - \bar{X}$$

4. Individual residual scores. Individual student posttest scores 
were regressed on individual student pretest scores for the 
total sample of students across schools and mean individual 
residual scores calculated for each school. In this case, 

$$\mathrm{SEI} = \frac{1}{N}\sum_{i=1}^{N}\left[Y_i - (BX_i + \bar{\bar{Y}} - B\bar{\bar{X}})\right] = \bar{Y} - \left[\bar{\bar{Y}} - B(\bar{\bar{X}} - \bar{X})\right]$$

where $N$ is the number of students in the school taking both 
tests; $Y_i$ and $X_i$ are the posttest and pretest scores, 
respectively, for individual $i$; $B$ is the least squares 
estimate of the slope for the students across all schools; 
$\bar{\bar{Y}}$ and $\bar{\bar{X}}$ are the grand posttest and pretest 
means, respectively; and $\bar{Y}$ and $\bar{X}$ are the posttest and 
pretest means for the school. 



5. School residual scores (pretest model). School posttest means 
were regressed on school pretest means and school residual 
scores calculated. This is the one-predictor-variable equivalent 
of the method suggested by Dyer (1971) as a measure of school 
effectiveness. Here 

$$\mathrm{SEI} = \bar{Y} - (C\bar{X} + \bar{Y}' - C\bar{X}') = \bar{Y} - \left[\bar{Y}' - C(\bar{X}' - \bar{X})\right]$$

where $\bar{Y}$ and $\bar{X}$ are posttest and pretest means, as previously 
defined; $C$ is the least squares estimate of the regression 
slope of the school posttest means on the school pretest means; 
and $\bar{Y}'$ and $\bar{X}'$ are the unweighted averages of the school 
posttest and pretest means, respectively, across all schools. 

It should be pointed out that the school effectiveness indices derived 
from the five methods are not comparable in the absolute sense. The 
relative positions of the schools on the various school effectiveness 
indices may be compared, however. 
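
The five computations can be sketched in a few lines of Python. This is a minimal illustration only, not the program used in the study: the function names, the toy scores, and the reliability value of .96 are all hypothetical, and the pretest reliabilities are simply passed in rather than derived.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Least squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def school_effectiveness_indices(schools, x0, reliabilities):
    """Return the five SEIs for each school.

    schools: dict name -> (pretest scores, posttest scores)
    x0: pretest reference point used by Methods 1 and 2
    reliabilities: dict name -> pretest reliability (hypothetical input)
    """
    # Method 4 needs the pooled student-level regression and grand means.
    all_x = [x for xs, _ in schools.values() for x in xs]
    all_y = [y for _, ys in schools.values() for y in ys]
    b = ols_slope(all_x, all_y)
    grand_x, grand_y = mean(all_x), mean(all_y)

    # Method 5 needs the regression of school means on school means.
    mean_x = {k: mean(xs) for k, (xs, _) in schools.items()}
    mean_y = {k: mean(ys) for k, (_, ys) in schools.items()}
    c = ols_slope(list(mean_x.values()), list(mean_y.values()))
    avg_x, avg_y = mean(mean_x.values()), mean(mean_y.values())

    out = {}
    for k, (xs, ys) in schools.items():
        a = ols_slope(xs, ys)  # within-school slope
        xb, yb = mean_x[k], mean_y[k]
        out[k] = {
            "within": yb - a * (xb - x0),
            "within_corrected": yb - (a / reliabilities[k]) * (xb - x0),
            "mean_difference": yb - xb,
            "individual_residual": yb - (grand_y - b * (grand_x - xb)),
            "school_residual": yb - (avg_y - c * (avg_x - xb)),
        }
    return out


# Toy data for three hypothetical schools (not from the study).
schools = {
    "A": ([40, 50, 60], [45, 56, 70]),
    "B": ([45, 55, 65], [52, 60, 68]),
    "C": ([35, 45, 55], [42, 50, 61]),
}
result = school_effectiveness_indices(
    schools, x0=50, reliabilities={"A": 0.96, "B": 0.96, "C": 0.96}
)
```

Because the school residuals are deviations about a least squares line fitted to the school means, they sum to zero across schools, which is one reason the indices are comparable only in a relative sense.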

Most of the methods are straightforward computationally. However, an 
elaboration of the first two models is in order, since these models differ 
from those that have been used to estimate school effectiveness. 

The first two models are similar to analysis of covariance except that 
no assumption is made about the equality of slopes from school to school. 
This assumption about slopes is particularly restrictive when one deals 
with existing groups, such as the students in schools. Model 1 relates 
the observed posttest score to the observed pretest score. This procedure 
is like the one Rock, Baird, and Linn (1972) used to estimate college 
effectiveness. 

Often one is interested in comparing treatment groups not formed at 
random, as in the case where schools are being compared. Cronbach and 
Furby (1970) indicated that the findings of such a study can usefully be 
summarized by calculating the within-group regression equations relating 
true status on the posttest to true status on independent variables. Model 
2 does essentially what Cronbach and Furby recommended; the slope and 
intercept of the regression line relating the posttest to the pretest are 
corrected for the unreliability of the pretest measure. 

In each of the first two models, it is assumed for purposes of the 
present study that a straight line best describes the relationship between 
the pretest scores and the posttest scores for a given school. It is not 
assumed that the regression lines are the same from school to school, for 
a school may be more effective for one type of student than another. In 
Figure 1 it may be noted that School A appears to be the most effective for 
high-scoring students, while School B appears to be the most effective for 
low-scoring students. It is obvious in this case that a single school 
effectiveness index is a misleading index, since it does not indicate the 
fact that schools are differentially effective for students of differing 
abilities. The school effectiveness index depends upon which pretest score 
is selected as the reference point. 

Insert Figure 1 about here 

When only three schools are considered, as in Figure 1, a reference 
point need not be selected; the graph itself is an adequate description 
of school effectiveness. However, as the number of schools increases, a 
graph of the lines becomes very messy; and it is necessary to resort to a 
nongraphical procedure for describing school effectiveness. If the slopes 
of the regression lines were equal, then one could choose any arbitrary 
point on the pretest score scale and compare predicted posttest scores. 
The schools would maintain the same ordering, no matter which points were 
chosen. However, if the slopes were not equal, as is likely to be the case 
if existing groups are studied, the selection of a reference point is 
crucial. 
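
The dependence of the ordering on the reference point can be illustrated with a short Python sketch. The two schools, their slopes, and their means are hypothetical values chosen only to make the lines cross; they are not taken from the study data.

```python
def predicted_posttest(slope, pre_mean, post_mean, x0):
    """Posttest estimate at reference point x0 on a school's regression line."""
    return post_mean - slope * (pre_mean - x0)


# Hypothetical schools: (within-school slope, pretest mean, posttest mean).
lines = {"A": (1.2, 52.0, 58.0), "B": (0.6, 52.0, 58.0)}


def ranking(x0):
    """Schools ordered from highest to lowest estimated posttest at x0."""
    est = {k: predicted_posttest(*v, x0) for k, v in lines.items()}
    return sorted(est, key=est.get, reverse=True)


low_order = ranking(40.0)   # reference point for low-scoring students
high_order = ranking(64.0)  # reference point for high-scoring students
```

With these assumed values the steeper line (School A) ranks first at the high reference point and second at the low one, so no single ordering describes both groups of students.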

For purposes of the present study, it was decided to select reference 
points to represent low-scoring, middle-scoring, and high-scoring students 
and to compute a school effectiveness index for each group, as shown in 
Figure 1. The points selected were the mean pretest score across all 
schools and those points located one standard deviation above and below the 
mean; namely, 40.8892, 51.8201, and 62.7510. The school effectiveness 
indices are estimates of the mean posttest scores for individuals having 
these fixed pretest scores. While one could subtract the fixed point from 
the estimated mean to obtain a "growth" school effectiveness index, the 
estimated mean works just as well and is used in this study as the school 
effectiveness index. 

In Model 2, corrections for the unreliability of measurement were made. 
It is well known that test unreliability results in an underestimate of the 
slope of Y on X (see, for example, Snedecor & Cochran, 1967). The 
sample regression coefficient, as Snedecor and Cochran indicate, provides 
an unbiased estimate of $A^{*}R_{xx}$, where $A^{*}$ is the true slope and 
$R_{xx}$ is the reliability of the predictor variable $X$ for the group. 
Thus, $A^{*}$ can be estimated by dividing $A$, the observed slope, by the 
reliability of the test. 

The problem of bias due to measurement error is most serious when the 
posttest scores are estimated from regression lines determined on groups 
that have widely disparate pretest means. Suppose that two groups had the 
same observed slopes and intercepts, but differed only in mean performance 
on the pretest and posttest, as illustrated in Figure 2. For any selected 
reference point on X, the estimated value of Y would be exactly the same 
for the two groups. Suppose, however, that the slopes (and intercepts) were 
corrected for measurement error. They would then appear somewhat as shown 
by the dotted lines. The expected value of the group with the lower mean 
would be higher for any reference point, say $X_0$. In this case, the 
estimated values computed on the basis of the observed slopes and intercepts 
would be biased against the lower scoring group, a phenomenon that will 
obtain whenever the slope is positive. 

Insert Figure 2 about here 

The adjustment of the slopes requires the 
use of the test reliability. 
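
The size of this bias can be written out directly. The following is a sketch that assumes, for simplicity, that both groups share the same pretest reliability $R_{xx}$ and the same observed slope $A$; the subscript $c$ marks the corrected estimate and is notation introduced here only for illustration.

```latex
\begin{align*}
\hat{Y}(X_0)   &= \bar{Y} + A\,(X_0 - \bar{X}) \\
\hat{Y}_c(X_0) &= \bar{Y} + \frac{A}{R_{xx}}\,(X_0 - \bar{X}) \\
\hat{Y}_c(X_0) - \hat{Y}(X_0) &= A\,\frac{1 - R_{xx}}{R_{xx}}\,(X_0 - \bar{X})
\end{align*}
```

For a positive slope and a reliability below one, the adjustment grows with $X_0 - \bar{X}$. At any fixed reference point this quantity is larger for the group with the lower pretest mean, so the correction raises that group's estimate relative to the other group's.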

In this study test reliabilities for each school were not available; 
they had to be derived. If it is assumed that the error variances of two 
groups are the same, then the following formula (Gulliksen, 1950, p. 111) 
can be used for estimating the reliability of test scores for Group 2 from 
the reliability of Group 1 scores: 

$$R_{22} = 1 - \frac{S_1^2}{S_2^2}\,(1 - R_{11})$$

where $S_1^2$ is the variance of Group 1 on the test, 
$R_{11}$ is the reliability of the test for Group 1, 
$S_2^2$ is the variance of Group 2 on the test, and 
$R_{22}$ is the reliability of the test for Group 2. 

But $S_1^2(1 - R_{11})$ is the variance error of measurement. Thus, one can 
estimate the reliability of the pretest for a particular school if one 
knows the Total Reading variance error of measurement for the Metropolitan 
standardization sample and the variance of Total Reading scores for the 
school. 

The standard error of measurement for the standardization sample on 
Total Reading was 1.9 (Durost, Bixler, Wrightstone, Prescott, & Balow, 
1971); the variance error of measurement was thus $(1.9)^2$ or 3.61. Hence, 
the pretest reliability for a given school was estimated from the formula: 

$$R_{xx} = 1 - \frac{3.61}{S_x^2}$$

The estimated true slopes and intercepts for a given school were 
computed as follows: 

$$A' = \frac{A}{R_{xx}}, \qquad B' = \bar{Y} - A'\bar{X}$$

where $A'$ is the corrected slope, 
$A$ is the observed slope, 
$R_{xx}$ is the estimated pretest reliability, 
$B'$ is the corrected intercept, 
$\bar{Y}$ is the posttest mean, and 
$\bar{X}$ is the pretest mean. 

The corrected school effectiveness index for a given reference point, 
$X_0$, was computed by the formula $A'X_0 + B'$. 
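
The chain from school variance to corrected index can be sketched as follows. Only the standard error of measurement of 1.9 comes from the text; the function names and the school variance, slope, and means in the example are hypothetical.

```python
SEM_VARIANCE = 1.9 ** 2  # variance error of measurement, about 3.61


def pretest_reliability(school_variance, error_variance=SEM_VARIANCE):
    """Estimated pretest reliability for a school with the given variance."""
    return 1.0 - error_variance / school_variance


def corrected_sei(slope, pre_mean, post_mean, school_variance, x0):
    """Corrected index A'X0 + B' at pretest reference point x0."""
    r_xx = pretest_reliability(school_variance)
    a_corr = slope / r_xx                   # corrected slope A'
    b_corr = post_mean - a_corr * pre_mean  # corrected intercept B'
    return a_corr * x0 + b_corr


# Hypothetical school: pretest variance 100, observed slope 0.9.
r = pretest_reliability(100.0)
sei_at_mean = corrected_sei(0.9, 50.0, 56.0, 100.0, x0=50.0)
sei_low = corrected_sei(0.9, 50.0, 56.0, 100.0, x0=40.0)
```

With a pretest variance of 100 the estimated reliability is about .964, so the corrected slope is only slightly steeper than the observed slope. At the school's own pretest mean the corrected and uncorrected indices coincide, since both lines pass through the point of means.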

Sub-study I: Comparison of School Effectiveness Indices 

Procedures and Results 

The first sub-study involved a comparison of the school effectiveness 
indices generated by the five methods. The school effectiveness indices 
were computed according to the methods already outlined for each of the 
70 schools in the sample. Regression coefficients and other descriptive 
information for the schools that had the highest and lowest school 
effectiveness indices, as estimated from the corrected within-school 
regression lines, are reported in Table 2. The regression coefficients for 
the two other models using regression lines to generate the school 
effectiveness indices are shown in Table 3. It may be noted that the slope 
for the School Residual (Pretest) method is very close to one. Thus, school 
effectiveness indices generated using this model (observed mean minus 
estimated mean) will necessarily be highly correlated with Mean Difference 
Scores. 

Insert Table 2 about here 

Insert Table 3 about here 

Intercorrelations among the school effectiveness indices derived from 
the five methods are reported in Table 4. The intercorrelations among the 
nine school effectiveness indices were factor analyzed by the Minres method 
(Harman, 1967). The residual correlations were negligible after three 
factors had been extracted. The three derived factors rotated according 
to the normalized varimax criterion are shown in Table 5. 

Insert Table 4 about here 

Insert Table 5 about here 

The other 21 variables identified in the section on Variables were 
correlated with the school effectiveness indices to provide the basis for 
comparing the methods. These correlations are given in Table 6. 

Insert Table 6 about here 

In Tables 4 and 6 a correlation of ±.235 is significantly different 
from zero at α = .05. A difference of ±.15 in the correlations of any two 
school effectiveness indices with a third variable is significant at α < .05 
if the correlation between the school effectiveness indices is at least .80. 
(This difference is conservative, for it assumes a multiple correlation of 
zero between a weighted combination of the school effectiveness indices and 
the third variable. See Dubois, ly(-^'), p. 5-^9; for the exact test.) 
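
The critical correlation for a sample of 70 schools can be checked with the usual t transformation of r. The sketch below is an illustration only; the critical t of 1.9955 for 68 degrees of freedom (two-tailed, .05 level) is a tabled value supplied here as an assumption rather than computed.

```python
import math

T_CRIT_68 = 1.9955  # assumed tabled value: two-tailed .05 critical t, 68 df


def t_from_r(r, n):
    """t statistic for testing a correlation r against zero with n cases."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def critical_r(t_crit, n):
    """Smallest absolute correlation reaching t_crit with n cases."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


r_crit = critical_r(T_CRIT_68, 70)
```

For 70 schools this gives a critical correlation of roughly .235.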

Discussion: Direct Comparisons of the Estimates 

With respect to the correlations shown in Table 4, it may be noted, 
first of all, that the corrected school effectiveness indices correlated 
nearly perfectly with their corresponding uncorrected school effectiveness 
indices (19 vs. 23, 20 vs. 24, 21 vs. 25). Thus, in this study the correction 
for the unreliability of the pretest made little difference. This effect is 
not surprising in view of the fact that the Total Reading score on the 
Metropolitan Reading Test had a reliability of .97 for the national norm 
group. In this study, then, Variables 23, 24, and 25 were virtually 
interchangeable with Variables 19, 20, and 21, respectively. 

Secondly, the Mean Difference Scores (Variable 28) correlated nearly 
perfectly (r = .996) with the School Residual (Pretest) School Effectiveness 
Indices (Variable 30). This result, too, is not surprising in view 
of the fact that the slope for the School Residual (Pretest) School 
Effectiveness Index was very close to one (see Table 3). If the slope for 
the regression of school mean output on school mean input were 1.00, the 
two variables would have been perfectly correlated. 
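
This follows directly from the Method 5 formula. As a brief worked step, setting $C = 1$ in that formula gives:

```latex
\begin{align*}
\mathrm{SEI}_{5}
  &= \bar{Y} - \left[\bar{Y}' - (\bar{X}' - \bar{X})\right] \\
  &= (\bar{Y} - \bar{X}) - (\bar{Y}' - \bar{X}') \\
  &= \mathrm{SEI}_{3} - \text{constant}
\end{align*}
```

The subscripts 5 and 3 are used here only to label the School Residual and Mean Difference indices. Since $\bar{Y}' - \bar{X}'$ is the same for every school, the two indices would differ by a constant and therefore correlate perfectly.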

Thirdly, the Individual Residual School Effectiveness Indices were more 
highly correlated with the School Residual (Pretest) School Effectiveness 
Indices (r = .96) than any other type of school effectiveness index. Both 
of these methods utilize as school effectiveness indices deviations about 
the regression line of a reference group. A school's school effectiveness 
index is the difference between the observed school posttest mean and the 
predicted school posttest mean. Although the regression coefficients for 
the two models were different (see Table 3), apparently the higher intercept 
for the Individual Residuals compensated enough for the lower slope to yield 
predicted school posttest means that were similar to those computed from the 
School Residual regression coefficients. 

Fourthly, assuming the validity of the Within-School Regression 
Corrected School Effectiveness Index, the schools were differentially 
effective for low-, middle-, and high-scoring students. The correlation 
of the school effectiveness indices for low-scoring students (Variable 23) 
with the school effectiveness indices for middle-scoring students (Variable 
24) was .79, but the correlation of Variable 23 with the school effectiveness 
indices for high-scoring students (Variable 25) was only .25. Thus, the 
rank ordering of the schools changed substantially with the ability of the 
students. On the basis of the current data, it must be concluded that in 
general a single school effectiveness index for a school is not an accurate 
description of the effectiveness of the school for students at all ability 
levels. 

Finally, of the school effectiveness indices not computed from 
within-school regression (Variables 26, 28, and 30), the Individual Residual 
School Effectiveness Indices (Variable 26) correlated highest with the 
various within-school regression (corrected and uncorrected) school 
effectiveness indices. The correlation of the Individual Residual School 
Effectiveness Indices with the corrected school effectiveness indices for 
middle-scoring students (r = .95) was considerably higher than the 
correlations with the corrected school effectiveness indices for low- and 
high-scoring students (r's = .77 and .75, respectively). Figure 3 shows the 
relation between Variables 24 and 26. It may be noted that the discrepancy 
between the two methods increased as the school effectiveness indices 
increased. Perhaps the within-school regression lines for "high-scoring" 
schools deviated more from the total individual regression line than did 
the lines for "low-scoring" schools. This and other hypotheses should be 
explored with new samples. 

Insert Figure 3 about here 

Discussion: Factor Analytic Results 

The data in Table 5 indicate that three dimensions are necessary to 
account for the intercorrelations among the school effectiveness indices 
and further demonstrate that the various methods yield different estimates 
of school effectiveness. Factors I and II represent school effectiveness 
for low-scoring and high-scoring students, respectively. Factor III 
separates the Mean Difference Scores and the School Residual (Pretest) 
School Effectiveness Indices from the other types of school effectiveness 
indices. The Within-School Regression (corrected and uncorrected) School 
Effectiveness Indices (Middle-Scoring Students) and the Individual Residual 
School Effectiveness Indices have moderate loadings on all three factors. 
These factor-analytic results lend support to the claim that one school 
effectiveness index is insufficient for summarizing school effectiveness 
and to the claim that Mean Difference Scores and School Residuals provide 
estimates of school effectiveness that are different from those provided 
by the other methods. 

Discussion: "Meanings" of the Estimators 

The correlations of the school effectiveness indices with Variables 
1-18, 22, 27, and 29 (see Table 6) provide a basis for interpreting the school 
effectiveness indices. In terms of their correlations with another variable, 
the variables were, with few exceptions, ordered (high-to-low or low-to-high) 
as follows: Variable 24 (or Variable 20), Variable 26, Variable 30, and 
Variable 28. Differences of ±.15 or more existed between at least one pair 
of school effectiveness indices in the correlations with Variables 1, 2, 3, 
4, 6, 10, 12, 13, 14, and 18. Particularly striking are the differences in 
the correlations of the school effectiveness indices with Variables 12, 14, 
3, 2, and 6. 

Several of the school effectiveness indices have relatively high 
correlations with the Total Reading pretest means. The correlations of the 
Within-School Regression Corrected School Effectiveness Indices with the 
pretest means ranged from .19 to .38. The Individual Residual School 
Effectiveness Indices had a correlation of .28 with the pretest means. Of 
course, the School Residual (Pretest) School Effectiveness Indices had no 
relationship to the pretest means, since they were computed by partialing 
out the pretest means. The Mean Difference Scores correlated negatively 
(-.10) with the pretest means. 

O'Connor (1972) claimed that a method using school residuals (cf. 
Variable 30) is preferable to a method using mean individual residuals 
(cf. Variable 26). His argument was based on the assumption that mean 
student input and residuals should be uncorrelated. However, the "true" 
correlation of school effectiveness and the initial input of students, 
while unknown, might well be positive. Wealthier school districts, which 
frequently have better facilities and a more experienced, highly trained 
staff, usually serve higher achieving students. If the schools in such 
districts were more effective, given equal student input, one would expect 
a positive correlation between student input and effectiveness rather than 
a correlation of zero. 

The possibility must be entertained that the higher school effectiveness 
indices of schools serving students of higher initial achievement 
levels were due to the lack of control over relevant variables. Campbell 
and Erlebacher (1971), in discussing ex post facto evaluations of 
compensatory education, pointed out: "...The usual procedures of selection, 
adjustment, and analysis produce systematic biases in the direction of 
making the compensatory program look deleterious [p. 185]." Barnow (1972), 
however, demonstrated that under certain conditions ex post facto analysis 
does not lead to bias. 

The negative correlations of Mean Difference Scores with Pretest Mean 
were not unexpected. It is well known that difference scores tend to be 
negatively correlated with initial scores (see, for example, Thorndike, 
1966). Such a condition would result in a negative correlation between 
the school effectiveness indices and the Pretest Means. This condition 
is undesirable in that it produces results biased in favor of initially 
lower-scoring groups, and it is for this reason that an attempt is made to 
control initial status. It should be pointed out that sometimes this bias 
in difference scores can lead to an unbiased measure of effectiveness, when 
it counterbalances the bias from other sources (Campbell & Erlebacher, 1971). 
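
The source of the negative correlation can be seen in a short derivation. This is a sketch under classical test theory assumptions that are not stated in the text: the pretest is $X = T + e_x$ and the posttest is $Y = T + g + e_y$, where $T$ is true initial status, $g$ is true gain, and the errors $e_x$ and $e_y$ are uncorrelated with $T$, with $g$, and with each other.

```latex
\begin{align*}
Y - X &= g + e_y - e_x \\
\operatorname{Cov}(Y - X,\; X)
  &= \operatorname{Cov}(g + e_y - e_x,\; T + e_x) \\
  &= \operatorname{Cov}(g, T) - \operatorname{Var}(e_x)
\end{align*}
```

If true gain is unrelated to true initial status, the covariance reduces to $-\operatorname{Var}(e_x)$, which is negative whenever the pretest contains measurement error. A positive relation between gain and initial status can offset this term.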

The correlations of the school effectiveness indices with the Total 
Reading posttest means are even higher than those with the pretest means. 
This result is to be expected whenever there are treatment effects (in 
this case, school effects), for the posttest scores reflect the treatment 
effects as well as the initial, achievement levels. 

The correlations of the Within-School Regression (corrected and 
uncorrected) School Effectiveness Indices (Middle-Scoring Students) and 
the Individual Residual School Effectiveness Indices with Percent 
Non-White (Variable 3) are much lower than those for Mean Difference Scores 
and School Residual (Pretest) School Effectiveness Indices. All of the 
correlations were negative. The same pattern is true of the correlations with 
K-12 District Current Operating Expense per Pupil (Variable 2). However, the 
correlations in this case were positive. Thus, Mean Difference Scores and 
School Residual (Pretest) School Effectiveness Indices tended to be less 
correlated (in an absolute sense) with the racial composition of the schools 
and more correlated with cost per pupil. The correlations with Percent of 
Teachers with Five or More Years Experience (Variable 6) were close to 
zero. Those of the Within-School Regression (corrected and uncorrected) 
School Effectiveness Indices were slightly positive, while those for Mean 
Difference Scores and School Residual (Pretest) School Effectiveness 
Indices were slightly negative. 

Summary

This analysis indicates that the school effectiveness indices generated
by the five different methods differ and have somewhat different correlation
patterns with other variables. However, which of the school effectiveness
indices best approximates "true" school effectiveness over one academic
year is not known. There is a need for a study of schools of known quality,
so that a determination of the validities of the various school
effectiveness indices can be made.
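
A small worked example may help show why the methods give different numbers
for the same school. The sketch below is an illustration only: it takes the
pretest and posttest means listed for School 082 in Table 2 and the pooled
regression coefficients listed in Table 3, and computes a mean difference
score and two residual-type indices. The function names are hypothetical, and
the study's own programs may have scaled or centered the indices differently.

```python
def mean_difference(pre_mean, post_mean):
    """Mean gain for a school: posttest mean minus pretest mean."""
    return post_mean - pre_mean

def mean_residual(pre_mean, post_mean, intercept, slope):
    """Observed posttest mean minus the mean predicted from the pretest mean.

    With a least-squares line that has an intercept, the average of the
    individual residuals in a school equals this quantity.
    """
    return post_mean - (intercept + slope * pre_mean)

pre, post = 47.5, 58.8  # School 082 means as listed in Table 2

print(round(mean_difference(pre, post), 2))             # mean difference score
print(round(mean_residual(pre, post, 15.60, 0.82), 2))  # student-level line, Table 3
print(round(mean_residual(pre, post, 8.98, 0.96), 2))   # school-level line, Table 3
```

The three results are about 11.3, 4.25, and 4.22: the difference score
credits the school with its entire gain, while the residual indices credit
only the part of the posttest mean not predicted from the pretest mean.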

Sub-study II: Stability of the Estimates

In the preceding section, differences among the various estimates of
school effectiveness were pointed out. One factor that should be considered
in choosing a method of estimating school effectiveness is the variance,
or mean square error, of the estimator. A biased estimator may be useful
if it is not "too" biased and if the estimates vary little from one sample
to another. In this section evidence concerning the stability (reliability)
of the various estimates of school effectiveness is presented.

Existing statistical theory could have been used to derive estimates
of the reliabilities. However, since such estimates are based on normal
distribution theory, which may not apply here, an empirical determination
of the reliabilities of the estimators was made. As in the previous
sub-study, Total Reading scores were used.

Procedures and Results

The sample from each of the schools in the study was divided into
random halves by use of the Tausworthe random number generator for the
IBM 360 (Whittlesey, 1968). Then each of the five methods of computing
school effectiveness indices was used to estimate school effectiveness
for each sample. Ten school effectiveness indices were thus available for
each school. (These correspond to Variables 19-21, 23-26, 28, and 30,
identified in Table 1.)
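
The split into random halves can be sketched as follows. This is not the
original program: Python's built-in generator stands in for the Tausworthe
generator, and the function name, the seed, and the student identifiers are
assumptions made for illustration.

```python
import random

def split_random_halves(student_ids, seed=0):
    """Shuffle one school's students and cut the list into two halves."""
    rng = random.Random(seed)      # seeded so the split can be reproduced
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# A hypothetical school with 54 tested students yields two halves of 27.
half_a, half_b = split_random_halves(range(1, 55), seed=1970)
print(len(half_a), len(half_b))
```

Each method would then be applied separately to the two halves, giving two
estimates per school and per index. When a school has an odd number of
students, the second half is one student larger.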

The variances of the estimators could not be compared directly because
the scales differed from one set of school effectiveness indices to another.
Therefore, a scale-free index had to be used. Two such indices were used
here: a reliability coefficient and a signal-to-noise ratio.

Reliability coefficients for the school effectiveness indices estimated
by each of the five methods were computed by means of analysis of variance.
The variation of the school effectiveness indices estimated by a particular
method for the two random halves can be divided into among-school variation
and within-school variation. The expected mean squares are as follows:

Source of Variation          Expected M.S.

Schools                      $\sigma^2_{\text{samples}} + 2\sigma^2_{\text{schools}}$
Samples within schools       $\sigma^2_{\text{samples}}$

As indicated in Winer (1962), the reliability of the mean of two observations
is estimated by

$$\frac{\sigma^2_{\text{schools}}}{\sigma^2_{\text{schools}} + \sigma^2_{\text{samples}}/2}.$$

Here, of course, the observations were the school effectiveness indices for
the two samples. Each sample was composed of 27 students on the average, but
the number varied from school to school. The variation in sample sizes did
not enter into the computations here. Thus, reliability estimates were based
on an unweighted-means analysis of variance.
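
The computation described above can be sketched in a few lines of Python. The
function below is an illustration under the stated model (two half-sample
indices per school, unweighted), not the original program; its name and the
toy data are assumptions.

```python
def half_sample_reliability(sei_a, sei_b):
    """Variance components and reliability from two half-sample SEIs per school.

    Returns (var_schools, var_samples, reliability of the two-sample mean).
    """
    n = len(sei_a)
    means = [(a + b) / 2 for a, b in zip(sei_a, sei_b)]
    grand = sum(means) / n
    # Among-school mean square: two observations per school, n - 1 df.
    ms_schools = 2 * sum((m - grand) ** 2 for m in means) / (n - 1)
    # Within-school mean square: one df per school.
    ms_samples = sum((a - b) ** 2 / 2 for a, b in zip(sei_a, sei_b)) / n
    var_samples = ms_samples
    var_schools = (ms_schools - ms_samples) / 2
    reliability = var_schools / (var_schools + var_samples / 2)
    return var_schools, var_samples, reliability

# Toy data for three hypothetical schools (not values from the study).
print(half_sample_reliability([10, 20, 30], [12, 18, 30]))
```

The reliability returned here is algebraically the same as
(MS schools - MS samples) / MS schools, which is a convenient check on the
variance components.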

For each method a signal-to-noise ratio was also computed. It is simply
$\sigma^2_{\text{schools}}$ divided by the estimated "noise,"
$\sigma^2_{\text{samples}}/2$. This index furnishes another way to look at
the stability of an estimator. It is informative because, as Stanley (1971)
pointed out, an increase in the ratio is directly related to an increase in
number of items (or number of student samples in this case). Thus, by
dividing the signal-to-noise ratio of one measure by that of another, one can
discover how many samples would have to be used in order to make the
reliability of the two measures equal.

Table 7 shows the variance components, signal-to-noise ratios, and
reliability coefficients for the school effectiveness indices estimated
by the five different methods.

Insert Table 7 about here 



Discussion 

The Individual Residual School Effectiveness Indices were the most
stable across samples, having a reliability coefficient of .85 and a
signal-to-noise ratio of 5.61. The Within-School Regression (corrected and
uncorrected) School Effectiveness Indices were the least stable, particularly
the school effectiveness indices for high-scoring students. The instability
of the school effectiveness indices for high- and low-scoring students is to
be expected, since under usual conditions data are limited in the extremes.



For instance, under normal distribution theory the variance of the
Within-School Regression School Effectiveness Index for a given school is a
function of $(X_0 - \bar{X})^2$, where $X_0$ is the reference point for which
the school effectiveness index is to be estimated, and $\bar{X}$ is the mean
pretest score for the school (see, for example, Draper & Smith, 1966, p. 22).
The variance increases as $(X_0 - \bar{X})^2$ increases, and for most schools
would yield larger variances for the school effectiveness indices at the
extreme reference point than for the school effectiveness indices at the
middle reference point. It should be remembered that indices of school
effectiveness for students of differing initial achievement were not computed
for three of the methods. Thus, the stability of the Individual Residual School
Effectiveness Indices, Mean Difference Scores, and School Residual (Pretest)
School Effectiveness Indices for high- and low-scoring students is not
known.
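
To make the dependence on the reference point concrete, the sketch below
evaluates the standard multiplier of the residual variance for a value
predicted from a simple regression line, 1/n plus the squared distance of the
reference point from the pretest mean divided by the pretest sum of squares.
The inputs are the entries listed for School 082 in Table 2 (28 students,
pretest mean 47.5, pretest SD 6.9) and the three reference points given in
the footnote to that table. Treating the tabled SD as computed with divisor N
is an assumption, and the function name is illustrative.

```python
def prediction_variance_factor(n, pretest_mean, pretest_sd, x_ref):
    """Multiplier of the residual variance in Var(predicted value at x_ref)."""
    sxx = n * pretest_sd ** 2   # sum of squared deviations, assuming SD uses divisor N
    return 1.0 / n + (x_ref - pretest_mean) ** 2 / sxx

for label, x_ref in [("high", 62.7510), ("middle", 51.8201), ("low", 40.8892)]:
    print(label, round(prediction_variance_factor(28, 47.5, 6.9, x_ref), 3))
```

For this school the multiplier is roughly 0.21 at the high reference point,
0.05 at the middle one, and 0.07 at the low one, so the index for
high-scoring students has about four times the sampling variance of the index
for middle-scoring students.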

The school effectiveness indices for middle-scoring students computed
from the within-school regression lines were slightly less stable than the
Individual Residual School Effectiveness Indices, Mean Difference Scores,
and School Residual (Pretest) School Effectiveness Indices. Comparing the
signal-to-noise ratios indicates that it would take 1.4 (5.61/3.92) student
samples to make the reliability of Within-School Regression School
Effectiveness Indices (Middle-Scoring Students) equal to that of Individual
Residual School Effectiveness Indices based on one student sample.
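
The arithmetic behind this comparison can be checked with a short sketch. It
takes the two signal-to-noise ratios as 5.61 and 3.92, which is an assumption
about the Table 7 entries since that table is not reproduced here, and it
uses the identity that reliability equals S/(1 + S) when S is the school
variance divided by the noise term. The function names are illustrative.

```python
def reliability_from_snr(snr):
    """Reliability implied by snr = var_schools / (var_samples / 2)."""
    return snr / (1.0 + snr)

def samples_needed(snr_reference, snr_other):
    """Factor by which the other measure's sample count must grow to match."""
    return snr_reference / snr_other

print(round(reliability_from_snr(5.61), 2))   # 0.85
print(round(samples_needed(5.61, 3.92), 1))   # 1.4
```

The first result agrees with the reliability coefficient of .85 reported for
the Individual Residual indices, and the second reproduces the factor of 1.4.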

It may be noted that the within-school regression corrected school
effectiveness indices were less stable than their corresponding uncorrected
school effectiveness indices, presumably because of the error involved in
estimating reliability. Since it can be assumed that the corrected school
effectiveness indices were less biased, one is forced to choose between a
school effectiveness index that is less biased and a school effectiveness
index that is more stable.

While the extent of the bias in the various estimates is unknown, if
it could be determined that the other methods of computing school
effectiveness indices were only slightly more biased than the within-school
regression school effectiveness indices, then they might be more useful as
measures of school effectiveness because of their greater stability.

In any case, for this sample of schools, all of the school effectiveness
indices except those for high-scoring students appear stable enough
to warrant their use as measures of school effectiveness. However, with
regard to stability the Individual Residual School Effectiveness Indices
are the preferred ones.

Sub-study III: Prediction of School Effectiveness Indices

Given that reasonably good estimates of school effectiveness are
available from longitudinal data, it may be possible to predict the school
effectiveness indices with a reasonable degree of accuracy from nonlongitudinal
data that are readily available in many schools. This possibility was
investigated in Sub-study III.

Procedures and Results 

The Within-School Regression Corrected School Effectiveness Index
(Middle-Scoring Students) was selected as the measure of school
effectiveness to be predicted from the state variables, and hereafter is
referred to as SEI'(M). Because of the correction for the unreliability of the
pretests, this school effectiveness index was assumed to be less biased than
the other school effectiveness indices for estimating overall school
effectiveness. The variables used to predict SEI'(M) were Variables
1-9, 18, and two new variables. (See Table 1 for descriptions.) The two
new predictor variables were the posttest mean and posttest standard
deviation based on all students who took the posttest. These variables
are the nonlongitudinal counterparts of Variables 14 and 15, which were
based on those students who took both the pretest and the posttest, and
are labeled 14' and 15' in the remainder of this section. Such achievement
data, based on all students rather than on a longitudinal sample, are often
available in schools. Variable 18, Percent of Students Taking Both Tests,
was included as a predictor variable even though it was based on test data
taken at two points in time. As was indicated previously, this variable
is a measure of the stability of the student body. Although a stability
index in the form of Variable 18 might not be available in many schools,
some index of the stability of the student body would usually be available.

A forward-selection stepwise regression procedure was used to select
the predictor variables to be included at each stage. In this process the
regression of the variables incorporated into the model in previous stages
was examined. Predictors were added until the amount of variance accounted
for by any predictor left out of the model was less than .001. The results
are given in Table 8. Multiple Rs, standard errors of estimate, and F-tests
are reported as well as unstandardized regression coefficients.
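
The selection rule can be illustrated with a short, self-contained sketch. It
is not the program used in the study: it implements plain forward selection
only (no re-examination of earlier entries), the stopping threshold is read
as a proportion of total variance, and the function name and toy data are
assumptions.

```python
def forward_select(columns, y, min_gain=0.001):
    """Greedy forward selection; returns (chosen column indices, cumulative R^2)."""
    def center(v):
        m = sum(v) / len(v)
        return [x - m for x in v]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    resid = center(y)                    # part of y not yet explained
    sst = dot(resid, resid)
    if sst == 0:
        return [], 0.0
    # Predictors, each kept orthogonal to those already chosen.
    work = {j: center(c) for j, c in enumerate(columns)}
    chosen, r2 = [], 0.0
    while work:
        # Gain in R^2 from adding each remaining predictor.
        gains = {}
        for j, x in work.items():
            ss = dot(x, x)
            gains[j] = dot(x, resid) ** 2 / (ss * sst) if ss > 1e-12 else 0.0
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break
        xb = work.pop(best)
        ssb = dot(xb, xb)
        b = dot(xb, resid) / ssb
        resid = [r - b * v for r, v in zip(resid, xb)]
        # Remove the chosen predictor's contribution from the remaining ones.
        work = {j: [a - (dot(x, xb) / ssb) * v for a, v in zip(x, xb)]
                for j, x in work.items()}
        chosen.append(best)
        r2 += gains[best]
    return chosen, r2

# Toy data: y depends on the first two columns only.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1, -1, 1, -1, 1, -1]
x3 = [3, 1, 2, 2, 1, 3]
y = [3 * a + 0.5 * b for a, b in zip(x1, x2)]
print(forward_select([x1, x2, x3], y))
```

In the toy data the first two columns are entered and the third is left out
because it would add less than .001 to the variance accounted for.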



Insert Table 8 about here 



Discussion 

Only three variables, Variables 14', 4, and 2, made a significant
contribution to prediction at α = .05 (see the bottom line of Table 8).
These were Posttest Total Reading Mean, Pupil/Professional Instructional
Staff Ratio, and K-12 District Current Operating Expense per Pupil
(1969-70). The correlation of these three predictors with SEI'(M) was
.79. The equation using all of the variables correlated .85 with SEI'(M)
and accounted for 68% of the variance, as opposed to 62% for the three
predictors. No validation of these results was attempted; but, if the
weights derived from the full equation were used on a new sample, the
shrinkage that would probably result makes it desirable to use the
three-predictor equation.

It is interesting to observe that the weight for a given variable
changed very little as predictors were added. One would have expected
that, as a result of the increasing multicollinearity of the predictors,
the regression weights would have bounced around. The squared multiple
correlation of any one of the predictor variables with the remaining
predictor variables is an indication of the collinearity of that variable
with the others. In the nine-predictor subset, Variables 8 and 9 had the
highest squared multiple correlations (.87) with the other predictors (see
Table 8). The highest squared multiple correlation of any one predictor
with the remaining predictors in the three-predictor subset was only .06.
Despite the increasing multicollinearity, the regression weights remained
relatively stable.
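
One common way to read these squared multiple correlations is through the
variance inflation factor, 1/(1 - R²), which gives the factor by which
collinearity multiplies the sampling variance of a regression weight. The
study does not report this quantity; the sketch below simply converts the two
values quoted above, and the function name is illustrative.

```python
def variance_inflation(r2_with_others):
    """Variance inflation factor implied by a squared multiple correlation."""
    return 1.0 / (1.0 - r2_with_others)

print(round(variance_inflation(0.87), 1))   # nine-predictor subset, Variables 8 and 9
print(round(variance_inflation(0.06), 2))   # three-predictor subset, largest value
```

The two factors are about 7.7 and 1.06, which is why unstable weights might
have been expected in the nine-predictor equation but not in the
three-predictor one.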

It is also interesting to note that Variable 3, Percent of Non-White
Students, was not among the nine predictors selected by the stepwise
regression program, even though its zero-order correlation with SEI'(M)
was -.45 (see Table 6). If it had been entered as a tenth variable, it
would have accounted for only 6% of the variance associated with SEI'(M).
Thus, almost all of the SEI'(M) variance accounted for by Variable 3 was
also associated with the other predictors.

Assuming the validity of the Within-School Regression School
Effectiveness Indices for middle-scoring students, it appears that a reasonable
estimate of the effectiveness of a school for a given year can be made
from the mean score of a school (from spring data), pupil/professional
instructional staff ratio, and current operating expense per pupil. The
regression weights obtained in this study would apply only if the same
measures were used. However, the standardized regression weights may apply
more generally. The standardized weights for Variables 14', 4, and 2,
respectively, were .78, -.27, and .25, indicating that weighting the standard
scores by 3, -1, and 1 would give an approximate indication of school
effectiveness. The school effectiveness indices predicted from these
three variables are plotted against SEI'(M) in Figure 4.
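
The step from the standardized weights to simple integer weights can be
sketched as follows. The rounding rule (scale so that the smallest weight has
magnitude 1, then round) is an assumption about how such weights might be
derived, and the standard scores of the example school are hypothetical.

```python
STD_WEIGHTS = (0.78, -0.27, 0.25)   # Variables 14', 4, and 2, as reported above

def integer_weights(std_weights):
    """Scale weights so the smallest magnitude is 1, then round to integers."""
    unit = min(abs(w) for w in std_weights)
    return tuple(round(w / unit) for w in std_weights)

def composite(z_scores, weights):
    """Weighted sum of a school's standard scores on the three predictors."""
    return sum(w * z for w, z in zip(weights, z_scores))

weights = integer_weights(STD_WEIGHTS)
print(weights)                                # (3, -1, 1)
# Hypothetical school: z = 1.0 on posttest mean, 0.5 on staff ratio,
# and -0.5 on operating expense.
print(composite((1.0, 0.5, -0.5), weights))   # 2.0
```

The composite is on an arbitrary scale and is useful only for ranking schools
relative to one another.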

Insert Figure 4 about here

Conclusions

This study has shown that the five methods of estimating school
effectiveness yield indices that are highly correlated. However, they are
different enough from one another and in their relations with other variables
to prevent them from being used interchangeably. The school effectiveness
indices for initially low- and high-scoring students appeared to yield
unique information and raised doubts about using a simple index to measure
school effectiveness. While differences existed among the school
effectiveness indices, little evidence could be found for the lack of validity
of any school effectiveness index. The methods should be tried out in a
situation where the quality of schools is well known, so that a reasonable
choice among the estimators can be made.

The study has also shown that, except for the Within-School Regression
(corrected and uncorrected) School Effectiveness Index for high-scoring
students, the various school effectiveness indices were highly stable.
However, the stability was measured in terms of the sampling error associated
with random halves. A more important kind of stability is the stability of
the school effectiveness indices from one year to the next for schools whose
physical facilities, staff, student body characteristics, and programs remain
basically unchanged. Forsyth (1973) investigated the stability of
high-school school effectiveness indices based on the residuals of twelfth
grade means from predicted means based on ninth grade data. He found
correlations in the .20's for two different longitudinal student samples. The
extent to which the various school effectiveness indices included in this
study are stable over years is unknown and needs to be studied.

A final conclusion is that over a one year period predictions from 
nonlongitudinal data furnished reasonable estimates of school 
effectiveness (r = .79), assuming the validity of the Within-School 
Regression School Effectiveness Index for middle-scoring students. The 
results here were not validated on a separate sample of schools. While 
the use of nonlongitudinal data as a substitute for school effectiveness 
indices based on longitudinal data seems promising, the method should not 
be used in a practical setting until further evidence is accumulated.

The conclusions from this study are limited in their generalizability
in two respects: (a) The study involved an accidental sample of third
grade students. The students were in schools that had Title I reading
programs and in comparison schools and were somewhat below average in
reading achievement. The results may have differed if students of higher
ability or at higher grade levels had been involved. (b) The study was
limited to longitudinal data collected during one academic year. The
methods should be tried out in a situation where a two- or three-year
interval exists between pretesting and posttesting and where data on more
than one cohort are available.

The author wishes to acknowledge the assistance of Robert T. Patrick,
who wrote the computer programs for analyzing the data.

Table 1
Variables Used in the Study

Variable No.  Description

1  K-12 District Instructional Expense per Pupil (1969-70)
2  K-12 District Current Operating Expense per Pupil (1969-70)
3  Percent of Non-White Students
4  Pupil/Professional Instructional Staff Ratio (includes teachers,
   principals, librarians, counselors, etc.)
5  Pupil/Teacher Ratio
6  Percent of Teachers with Five or More Years of Experience
7  Percent of Teachers with Master's Degree
8  Total Number of Students in School (Fourth Friday counts)
9  Total Number of Third-Graders (Fourth Friday counts)
10  State-Funded Compensatory Education Program? (Yes, No)
11  State-Funded Remedial Reading Program? (Yes, No)
12  Pretest Total Reading Mean
13  Pretest Total Reading Standard Deviation
14  Posttest Total Reading Mean
15  Posttest Total Reading Standard Deviation
16  Weekdays between Pretest and Posttest
17  Percent of Students Participating in Title I Reading
18  Percent of Students Taking Both Tests
19  Within-School Regression SEI (Low-Scoring Students)
20  Within-School Regression SEI (Middle-Scoring Students)
21  Within-School Regression SEI (High-Scoring Students)
22  Within-School Regression Standard Error of Estimate
23  Within-School Regression Corrected SEI (Low-Scoring Students)
24  Within-School Regression Corrected SEI (Middle-Scoring Students)
25  Within-School Regression Corrected SEI (High-Scoring Students)
26  Individual Residual SEI
27  Individual Residual Standard Deviation
28  Mean Difference Score
29  Mean Difference Standard Deviation
30  School Residual SEI (Pretest)

Table 2

Within-School Regression Coefficients and Other Information for
Schools That Had the Highest and Lowest School Effectiveness
Indices for High-, Middle-, and Low-Scoring Students*

                            High-Scoring       Middle-Scoring     Low-Scoring
                            Students           Students           Students
                            Highest  Lowest    Highest  Lowest    Highest  Lowest
Item                        SEI      SEI       SEI      SEI       SEI      SEI

School Code                 082      152       082      152       192      04^
No. of Students             28       39        same     same      46       143
High Pretest Score          61       65        same     same      94       81
Low Pretest Score           38       8         same     same      36       13
Pretest Mean                47.5     42.7      same     same      59.5     52.2
Pretest SD (N)              6.9      10.3      same     same      11.9     9.6
Posttest Mean               58.8     46.5      same     same      67.8     55.2
Posttest SD (N)             11.1     9.1       same     same      9.5      10.3
Pretest-Posttest
  Correlation               0.90     0.78      same     same      0.59     0.86
Estimated Pretest
  Reliability               0.92     0.97      same     same      0.98     0.96
Intercept (uncorrected)     -9.65    16.92     same     same      39.99    6.93
Slope (uncorrected)         1.44     0.69      same     same      0.47     0.92
Intercept (corrected)       -15.19   15.89     same     same      39.26    4.96
Slope (corrected)           1.56     0.72      same     same      0.48     0.96
SEI (uncorrected)           80.8     60.5      65.0     52.9      59.1     44.7
SEI (corrected)             82.6     61.0      65.6     53.1      58.9     44.3

same = same as for High-Scoring Students.

*The reference points were 62.7510, 51.8201, and 40.8892 for high-,
middle-, and low-scoring students, respectively.
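
The tabled indices can be approximately reproduced from the tabled
coefficients. The sketch below assumes that a within-school regression SEI is
the posttest score predicted at the reference point, and that the corrected
slope is the uncorrected slope divided by the estimated pretest reliability.
The second assumption appears consistent with the entries above but is not
stated in this section. Function names are illustrative.

```python
def within_school_sei(intercept, slope, reference_point):
    """Posttest score predicted by a school's own line at a reference pretest score."""
    return intercept + slope * reference_point

def corrected_slope(slope, pretest_reliability):
    """Slope adjusted for unreliability of the pretest (assumed form of the correction)."""
    return slope / pretest_reliability

# School 082, high-scoring reference point.
print(round(within_school_sei(-9.65, 1.44, 62.7510), 1))
print(round(corrected_slope(1.44, 0.92), 2))
```

The results, about 80.7 and 1.57, are close to the tabled 80.8 and 1.56; the
small differences are what one would expect from using coefficients rounded
to two decimals.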



Table 3

Regression Coefficients and Other Information on the
Individual Residual and School Residual Models

                                       Model
                          Individual         School Residuals
Item                      Residuals          (Pretest)

Response Variable         Posttest Score     School Mean Posttest Score
Predictor Variable        Pretest Score      School Mean Pretest Score
No. of Observations       3769               70
Predictor Mean            51.82              51.82
Predictor SD              10.93              9.87
Response Mean             58.34              58.54
Response SD               11.27              10.14
Predictor-Response
  Correlation             0.80               0.90
Intercept                 15.60              8.98
Slope                     0.82               0.96
Standard Error of
  Estimate                6.76               2.14

Table 5

Derived Minres Factors for School Effectiveness Indices
(Normal Varimax)

                                        Factor
Variable                      I             II            III     h²

1. W/Schl Reg. (L)            [illegible]   [illegible]   .28     1.000
2. W/Schl Reg. (M)            [illegible]   [illegible]   .35     1.000
3. W/Schl Reg. (H)            .13           .95           .28     1.000
4. W/Schl Reg. Corr. (L)      .95           .09           .28     .997
5. W/Schl Reg. Corr. (M)      .65           .66           .37     .998
6. W/Schl Reg. Corr. (H)      .08           .95           .30     .995
7. Ind. Residuals             .58           .53           .59     .969
8. Mean Differences           .40           .42           .81     .994
9. School Residuals           .45           .46           .77     1.000

Factor Variance               3.42          3.36          2.17
% Factor Variance             38.2          37.5          24.2
% Total Variance              38.0          37.3          24.1
Eigenvalue                    6.93          1.54          0.48

Figure Captions

Fig. 1. Within-school regression lines for three hypothetical
schools.

Fig. 2. Comparison of "true" and observed regression lines.

Fig. 3. Plot of school effectiveness indices for middle-scoring
students from corrected within-school regression and school effectiveness
indices from total individual residuals.

Fig. 4. Plot of school effectiveness indices for middle-scoring
students from corrected within-school regression and school effectiveness
indices predicted from three variables.